Rethinking PostgreSQL Performance in the Age of Monster Hardware

PGconf.dev
2025-05-15

Image par nanoslavic de Pixabay
logo EDB

NUMA and Sub-NUMA

  • Non-Uniform Memory Access
  • Increased latency
  • PostgreSQL isn't NUMA-aware
  • There are performance implications
Image by AS Photograpy from Pixabay
logo EDB

What is NUMA?

  • Each CPU socket has its own local memory
  • Each CPU can access local or remote memory
  • Access time varies (2 times slower for remote)
  • Modern servers can have 2 NUMA nodes
logo EDB

NUMA Architecture Visualization

a schema describing the NUMA architecture with 2 NUMA nodes and shared buffers spread offer both nodes
logo EDB

NUMA Architecture Visualization

a schema describing the NUMA architecture with 2 NUMA nodes and shared buffers spread offer both nodes
logo EDB

What is Sub-NUMA?

  • Division within a CPU socket
  • L3 Cache memory is shared with several cores
  • L3 Cache is divided in sub-NUMA zones
  • Adds another layer NUMA-like problem
Image by andreas N from Pixabay
logo EDB

A modern CPU

a modern CPU with cores, L2 cache and L3 cache
logo EDB

How the others do it

a timeline showing Oracle, SQL Server and DB2 NUMA awareness
  • Oracle: NUMA-aware memory and processes
  • SQL Server: Soft-NUMA
  • DB2: full NUMA support
  • MariaDB and PostgreSQL: None πŸ€·β€β™€οΈ
Image by 00luvicecream from Pixabay
logo EDB

Current PostgreSQL Workarounds

  • Using numactl
  • Using systemd with CPUAffinity, NUMAPolicy, NUMAMask
  • Limiting connections per instance
  • Reducing shared_buffers
  • Using only one socket per host
Image by Lætitia Avrot from Montréal
logo EDB

Current PostgreSQL Workarounds

# Example: Start PostgreSQL bound to NUMA node 0
numactl --cpunodebind=0 --membind=0 -- pg_ctl start

# Example: Start PostgreSQL with balancing NUMA node usage on all nodes
numactl --interleave=all -- pg_ctl start
Image by Franck Barske from Pixabay
logo EDB

Partitioned Shared Buffers

  • Divide shared_buffers
  • Allocate the "right" buffers
  • Coordinate buffer management across partitions
  • Balance memory usage across nodes
Image by EHEHmiv from Pixabay
logo EDB

Implementation Challenges

  • Significant changes
  • Complex coordination between partitions
  • Maintaining buffer replacement policy
  • Risk of introducing instability
  • Testing complexity
  • Backward compatibility concerns
Image by Pexels from Pixabay
logo EDB

Is NUMA Worth It?

  • AMD: stop sub-NUMA optimization
  • Modern CPUs have 386+ threads
  • Interconnect speeds are improving
  • Code complexity vs. performance gain
Image by andreas N from Pixabay
logo EDB

NUMA Observability

  • New in Postgres 18
  • In pg_buffercache
  • Integration with libnuma
  • Future optimization capabilities
Image by StockSnap from Pixabay
logo EDB

NUMA Observability in PG18

-- Identifying buffer distribution across NUMA nodes
select page_num,node_id,count(*)
from pg_buffercache_numa
group by page_num,node_id
order by 1
limit 3;

page_num | node_id | count
----------+---------+-------
        0 |       1 |  1024
        1 |       0 |  1024
        2 |       1 |  1024
Image by StockSnap from Pixabay
logo EDB

Thread versus Processes

  • Multi-process architecture
  • Each connection forks postnaster
  • Modern hardware challenges this
  • Threading models = better performance ?
  • A lot of community discussions
Image by Bruno from Pixabay
logo EDB

Process Model Limitations

  • Each connection creates a process
  • Higher overhead
  • Sharing memory difficulty
  • Connection pooling mandatory
Image by Steve Buissinne from Pixabay
logo EDB

Other Databases

  • Oracle: Dedicated or shared server processes
  • SQL Server: Thread pool model
  • MariaDB: Thread-per-connection
Image by Pexels from Pixabay
logo EDB

Current Community Discussions

  • Heikki Linnakangas' talk
  • Active wiki page
  • Thomas Monroe's talk
Image by Bob Dmyt from Pixabay
logo EDB

Community Consensus

Multi-threaded programming model would be superior

Robert Haas, hackers mailing-list, 2024-06-05

  • Performance improvements
  • New features
  • Slow transition
Image by StockSnap from Pixabay
logo EDB

Technical Challenges

  • State management across threads
  • Thread-local storage for session state
  • Synchronization primitives and locking
  • Signal handling in threaded environment
  • Memory management and allocation
  • Build-time vs. runtime configuration options
logo EDB

Potential Performance Benefits

  • Lower memory overhead
  • Reduced context switching costs
  • More efficient cache utilization
  • Reduced latency for short transactions
  • More efficient on monster hardware
  • No more need for pooler?
Image by JenΕ‘ SzabΓ³ from Pixabay
logo EDB

Is It Worth it?

  • I/O is the bottlenecks
  • Storage technology is slow
  • CPU and RAM are fast
  • what's the best benefits/efforts ratio?
Image by JenΕ‘ SzabΓ³ from Pixabay
logo EDB

Real Bottlenecks on Monster Hardware

  • Storage I/O bottleneck
  • Network bandwidth
  • Lock contention
  • Memory bandwidth
  • Cache coherency overhead with many cores
Image by Manfred Richter from Pixabay
logo EDB

Performance Hierarchy


  Access Times (log scale)

  L1 Cache  β”‚β–ˆ                                           β”‚ 1ns      (4 cycles)
  L2 Cache  β”‚β–ˆβ–ˆβ–ˆβ–ˆ                                        β”‚ 4ns      (12 cycles)
  L3 Cache  β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ                                β”‚ 12ns     (40 cycles)
  RAM       β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ        β”‚ 100ns    (300 cycles)
  NVMe SSD  β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ”‚ 20,000ns  (60K cycles)
  SATA SSD  β”‚β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ”‚ 100,000ns (300K cycles)
  
Image by Manfred Richter from Pixabay
logo EDB

Where CPU Still Matters

  • Complex query execution
  • JSON/JSONB processing
  • Encryption/decryption operations
  • Text search and pattern matching
  • Window functions and analytical queries
Image by Bruno from Pixabay
logo EDB

On the field priorities

  • Datamodeling
  • SQL query writing
  • PostgreSQL tuning
  • β†’ Training devs and users
Image by Karl Egger from Pixabay
logo EDB

Balancing Optimization Efforts

  • CPU: specific workloads
  • Memory layout: large datasets
  • Storage performance: highest ROI for all
  • Architectural changes: highest potential
logo EDB

Conclusion: PostgreSQL on Monster Hardware

  • NUMA awareness is coming
  • Process-to-thread might come
  • Storage I/O remains the true bottleneck
Image by Alexa from Pixabay
logo EDB

Conclusion: PostgreSQL on Monster Hardware

  • Vertical scaling with monster servers is simpler
  • Do not decide on assumed limitations
  • PostgreSQL continues to evolve
Image by Alexa from Pixabay
logo EDB

Questions?

Thank You!

Bluesky: @l-avrot.bsky.social

Linkedin: lætitia-avrot

Image by Fabrizio Brugnoni from Pixabay
logo EDB